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Abstract 

We have developed a new simulation method to estimate the distance between the native state 
and the first transition state, and the distance between the intermediate state and the second 
transition state of a protein which mechanically unfolds via intermediates. Assuming that the 
end-to-end extension IS.R is a good reaction coordinate to describe the free energy landscape 
of proteins subjected to an external force, we define the midpoint extension Ai?* between two 
transition states from either constant-force or constant loading rate pulling simulations. In the 
former case, Ai?* is defined as a middle point between two plateaus in the time-dependent curve 
of Aii, while, in the latter one, it is a middle point between two peaks in the force-extension 
curve. Having determined Ai?*, one can compute times needed to cross two transition state 
barriers starting from the native state. With the help of the Bell and microscopic kinetic theory, 
force dependencies of these unfolding times can be used to locate the intermediate state and to 
extract unfolding barriers. We have applied our method to the titin domain 127 and the fourth 
domain of Dictyostelium discoideum filamin (DDFLN4), and obtained reasonable agreement with 
experiments, using the Cq-Go model. 



I. INTRODUCTION 



Despite numerous advances achieved within recent years deciphering the free energy 
landscape (FEL) of biomolecules remains a challenge to molecular biology. The most detailed 
information on the FEL may be gained from all-atom simulations, but this approach, due 
to its high computational expenses, is restricted to rather short peptides and proteins p, . 
In this situation, the AFM jj] or optical tweezers 5|] have been proved a very useful tool 
for probing the FEL of proteins. In most experiments , assuming that mechanical 

unfolding is well described by a two-state scenario, one can extract the distance between 
the native state (NS) and the transition state (TS), as well as the unfolding barrier AG^. 



Latest theoretical studies 
reasonable estimates for those quantities 
In recent experimental works 



10| showed that simple coarse-grained models can provide 
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12|, the FEL of three-state proteins has been studied. A 



schematic plot of the FEL for this class of proteins is shown in Fig. [TJ Single molecule force 
measurements allow the distance Xui between the native and the first transition state (TSl) 
and the distance Xu2 between the intermediate state (IS) and the second transition state 
(TS2) to be evaluated. The unfolding barriers AGJ = Gtsi — G^s and AGI = Gts2 — Gjs 
can also be determined. As far as we know, there were no theoretical attempts to calculate 
these important parameters. Therefore, the goal of this work is to develop a new method 
for computing them from simulations. Using the Go model 13| and our new method, we 
calculated the free energy landscape parameters for the three-state titin domain 127 and the 



domain DDFLN4. Our results are in reasonable agreement with the experiments 111. Il2|. 



II. METHOD 



The new method is based on the fact that the end-to-end extension, AR, is a good reaction 
coordinate for describing mechanical unfolding. It should be noted that the direction of the 
force and therefore AR is changing upon partial unfolding of a protein. The time Tui for a 
molecule to go from the NS to the IS is defined as a time needed to achieve the intermediate 
point AR* between the TSl and TS2 (Fig. [1]). The time to cross the TS2 barrier starting 
from the IS, r„2, is a time to reach the rod state starting from AR*. We propose to determine 
AR* from either constant-force or constant loading rate pulling simulations. Using the force 
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dependencies of Tui and r„2, and the Bell formula we can extract Xui and Xu2- The 
unfolding barriers AG\ and AG^ can be estimated in the framework of the microscopic 
kinetic theory 151. 

n 

To illustrate our method, we used the Go model [13|, which is an appropriate choice 
not only because the construction of FEL for long enough biomolecules by all-atom models 
is beyond present computational facilities, but also because unfolding of biomolecules is 
mainly governed by the native conformation topology 8|. Moreover, it has been recently 
shown j^, [l^ that the Go model provides reasonable estimates for Xu and AG^ in the two- 
state approximation and is expected to be applicable to three-state molecules. Details of 
the Go model are given in our previous works jy, [l^. We use the same set of parameters as 



for Go modeling of ubiquitin 



161]. The main computations were carried out at T = 285 K 



= O-bSen/kB, where ks is the Boltzmann constant, and en = 0.98kcal/mol is the hydrogen 
bond energy. The friction 70 was chosen to be the same as for water, i.e. 70 = 50^ 17|, 
where tl = {maO? / enY^"^ ~ 3 ps. Here the characteristic bond length between successive 
beads a ~ 4A and the typical mass of amino acid residues ^ 3 x 10~^^ g jx^j. For the 
water viscosity, one can neglect the inertia term and use the Euler method with the time 
step At = O.lr^ to solve the corresponding Langevin equation. 

Since native titin is a highly heterogeneous polymer, in experiments, one used a poly- 
protein composed of identical tandem repeats of the Ig module (I27i2 [ill, Q, Q) to study 
elastic properties of a single domain at high solution. In the DDFLN4 case, a single DDFLN4 
domain is sandwiched between Ig domains 127-30 and domains 131-34 from titin 2^. Un- 
folding of DDFLN4 can be easily studied, as its mechanical stability is much lower than that 
of Ig-domains (DDFLN4 would have lower peaks in the force-extension curve). In AFM ex- 
periments, one end of a poly-protein is anchored to a surface and the another end to a tip 
of a very soft cantilever. The molecule is stretched by increasing the distance between the 
surface and the cantilever as the external force acts on one of termini via the tip. Because 
a poly-protein mechanically unfolds domain by domain, one can consider that one end of 
a single domain is anchored and the force is ap plie d to the another one. Therefore, as in 



all-atom steered molecular dynamics simulations 



2l| , when the force is ramped linearly with 



time, we fix the N-terminal and pull the C-terminal by applying a force / = K^{vt — r 



Here r is the displacement of the pulled atom from its original position 



2l| . and the spring 



constant Kr is set to be the same as in the harmonic term of the potential energy for the 
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studied system |l3l. Il6l|. The pulling direction was chosen along the vector directed from the 
fixed atom to the pulled one. The pulling speed was set equal to v = 3.6 x 10^ nm/s, which 
is about 3-4 orders of magnitude faster than those used in experiments. In the constant 
force simulations, we add the term —fR to the total energy of the system, where R is the 
end-to-end distance, and / the force applied to the both termini. 

We define the unfolding time r„i as an average of the first passage times needed to reach 
the extension AR*. Different trajectories start from the same native conformation, but with 
different random number seeds. The unfolding time Tu2 is an average of the first passage 
times needed for the molecule to achieve the rod state starting from the extension AR*. In 
order to get a reasonable estimate for r„i and r„2, we have generated 30 - 50 trajectories for 
each value of /. 



III. RESULTS 



The crucial point in our method is how to determine AR*. We will show that both 
simulation ways specified above are valid for this purpose. 



Determination of AR* for 127 



Fig. [2^ presents the force-extension curve obtained at the speed v = 3.6 x 10^ nm/s. In 
accordance with the experiments 19| and all-atom simulations 2]|, we observe two peaks, 
which ascertain that unfolding proceeds via intermediates (if the protein unfolded without 
intermediates, a single peak would be observed). The first peak, located at ARmaxi ~ 8 
A, occurs due to a detachment of strand A (Fig. [2^) from the protein core. One can show 
that the appearance of the second peak at ARraax2 ~ 78 A is related to full unfolding of 
strands A', F, and G. Assuming that AR* is a middle point between two local peaks, we 
have AR* = {ARmaxi + ARmax2)/2 ^ 43 A. 

We now intend to show that AR* can also be determined from constant-force simulations. 
As is evident from Fig. [2](b), two plateaus occur at ARpi ^ 6 A and ARp2 ~ 78 A in the 
time dependence of AR(t). Within error bars, locations of plateaus coincide with those for 
peaks shown in Fig. [2^a). Again, the occurrence of two plateaus indicates that this domain 
mechanically unfolds in a three-state manner. Since the plateaus are related to crossing the 
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unfolding barriers AGl and AG 2, the middle point AR* between the TSl and TS2 can be 
identified as a middle point between two plateaus, i.e. AR* ^ {ARpi + ARp2)/2. For 127, 
AR* PS 42 A, which is close to the result, obtained by constant- velocity pulling simulations. 



Determination of AR* for DDFLN4 



Fig. [3h shows the force-extension profile for the same pulling speed as in the 127 case. 
Two peaks appear at ARmaxi ~ 14 and ARmax2 ~ 92 A. Using these values, we obtain 
AR* = (ARjnaxi + ARmax2)/'2 = 53 A. The existence of intermediates is also evident from 
the constant force simulations which give two plateaus at ARpi ^ 7 K and ARp2 ~ 85 A 
(Fig. [3b). Therefore, AR* = {ARpi + ARp2)/2 ^ 46A which is not far from the value 
followed from the force-extension curve. Since the peaks are not sharp and the plateau 
position fluctuates, one can consider that two simulation modes gave the same result. We 
will use the averaged value AR* = 50A for computing Tui and Tu2 for DDFLN4. 

In accordance with the experiments 12, |20[ , the Go model captures the overall behavior 
of DDFLN4 that it mechanically unfolds via intermediates. However, the location and 
structure of Go intermediates are different from the experimental ones. In the experimental 
force-extension curve j2o|, two peaks occurs at ARpi ^ 150A and ARp2 ~ 310A which are 
very different from our results (Fig. [3ti). Using loop mutations [20|, it was suggested that 
during the first unfolding event (first peak) strands A and B detach from the domain and 
unfold. Therefore, strands C - G form a stable intermediate structure, which then unfolds 
in the second unfolding event (second peak). In order to sort out intermediates in the Go 
model, we plot fractions of native contacts formed by each strand with the rest of the protein 
as a function of AR (Fig. |4^). Here, we present the results obtained for the case when the 
N-terminal is kept fixed and the force is applied to the C-terminal with the same loading rate 
as in Fig. [3ti. At the position of the first peak in the force-extension curve {AR = lAA ), 
strand G fully unfolded, while strands A and B are still structured (Fig. |4|l). Thus, contrary 
to the experiments I2I, [2^, Go intermediate conformations consist of six strands A-F. A 
typical snapshot of intermediates is shown in Fig. |4)3, where all contacts between G and 
F are broken, but a single contact between A and F remains intact. At the second peak 
position {AR = 92A ) denoted by an arrow in Fig. S^, together with G, strand F becomes 
unstructured and most of contacts of strand C are lost. The number of contacts of strands 
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A, B, D and E drops drastica' 
event. As in the experiments 



ly as the protein unfolds quickly after this second unfolding 
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20], at AR ^ 150A , strands A and B are detached from 



the core, but in our Go model, strands F and G have already unfolded. 
From Fig. S^, we obtain the following unfolding pathway for DDFLN4: 

G^F^ A^B^ (E,D,C). (1) 

In addition to this dominant pathway, other routes to the rode state are also possible (Fig. 
Mechanical unfolding pathways may be different, but they share a common feature that 



strand G always unfolds first. This also contradicts the experimental suggestion (20| that 
unfolding initiates from the N-terminal. It is not entirely clear, why the Go model gives dif- 
ferent unfolding pathways and intermediates compared to the experiments. Presumably, the 
discrepancy comes from the simplification of Go modeling, where the non-native interaction 
and the effect of environment are omitted. All-atom simulations are required to clarify this 
issue. 

Calculation of free energy landscape parameters 

Once AR* is found, one can compute the times r„i and as the functions of the external 
force / and extract Xui and Xu2, using the Bell equation |l4| : 

Tui = r^iexpi-fxui /kBT),i = 1,2. (2) 

We have tried several values of AR* in the interval AR* = (42±15) A, and AR* = (50±15)A 
for 127 and DDFLN4, respectively. Since the results remain essentially the same, we will 
present those obtained for A_R* = 42 A, and A_R* = 50A for 127, and DDFLN4, respectively. 
Fig. [5] shows the force dependencies not only for Tui and Tu2, but also for the full unfolding 
time Tu = Tui + Tu2- Strictly speaking, the formula = r^i + r„2 is valid if the probability 
of missed unfolding events is negligible. This happens when the applied force exceeds several 



pN, but not in the / ^ limit l2^ 1. Since our computations were carried out at f of tens 
pN (Fig. {5^, the mentioned above equality is applicable to extract r„. 

For 127, Tui is about 2-3 times larger than Tu2- It is also evident from Fig. MJo), which 
demonstrates that the second plateau exists during shorter time intervals than the first one. 
A similar situation happens for DDFLN4, but, for high forces, r„2 becomes eventually larger 
than r„i (Fig. Mh)). 
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In the low-force regime, fitting the data by Bell Eq. ([2]) (straight lines in Fig. [5]), we 
obtain Xui, and Xu2 quoted in Table. For both 127 and DDFLN4 our estimate of Xu2 agrees 
very well with the experiments, while that for bit higher than the experimental 

data. Given the simplicity of the Go model, the agreement between the theory and the 
experiment should be considered reasonable, but it would be interesting to check if a more 
comprehensive account for non-native contacts and environment could improve our results. 

It is to be noted here that although the Go model gives a different location of intermediates 
in the DDFLN4 force-extension curve in comparison with the experiments, it still provides 
reasonable estimates for Xui and Xu2- This is because the nature of Xui and Xu2 is different 
from that of the end-to-end distance: the former are a measure of the force dependence of 
barrier crossing rates, while the later is a real distance. Nevertheless, one has to be careful 
in comparison of Go results with experiments on DDFLN4. 

In the Bell approximation, one assumes that the location of the transition state does 
not move under the action of an external force. However, our simulations for ubiquitin. 



e.g., showed that it does move toward the NS 16|. Recently, assumin g th at Xu depends on 
the external force and using the Kramers theory, several groups 15|, [22] have tried to go 



beyond the Bell approximation. We follow Dudko et al. who proposed the following force 



dependence for the unfolding time 15|: 



biding barrier, and v = 1/2 and 2/3 for the cusp 23| and the linear-cubic 



Here, AG^ is the un 

free energy surface 2j], respectively. Note that u = 1 corresponds to the phenomenological 
Bell theory (Eq. ([2])). An important consequence following from Eq. ([3]), is that one can 
apply it to estimate not only Xu, but also G^, if u ^ 1. Since the fitting with u = 1/2 is valid 
in a wider interval as compared to the u = 2/3 case, we consider the former case only. The 
region, where the u = 1/2 fit works well, is expectedly wider than that for the Bell scenario 
(Fig. ([5])). However, for DDFLN4 this fit can not cover the entire force interval. 

In the 127 case, from the nonlinear fitting (Eq. (I3l) and Fig. [St^a)), we obtain Xui = 4.7 A, 
and Xu2 = 5.1 A , which are larger than the Bell theory-based estimations (see Table). Using 
raw experimental data [l^ and fitting with u = 1/2, in two-state approximation, Dudko 



et al. 



15 | obtained Xu = 4:A-„ which is close to our result. For DDFLN4, the nonlinear fit 



(Fig. M)^)) gives Xui = 13.1 A, and Xu2 = 12. SA which are about twice as large as the Bell 



theory-based estimates (Table). Such high values of Xu are typical for a-proteins and they 
may come from the low resistance of DDFLN4 because the less stable protein, the larger 
is Xu Iq]. Recently, Dietz and Rief 2^ have shown that the product fuXu ~ 50 pN nm is 



22| and 



probably universal for all proteins. Using the experimental result ~ 45 pN 
Xu ~ 13A , we obtain fuXu ~ 59 pN nm which is not far from this universal value. From 
this point of view, big values of Xui and Xu2 are still acceptable, but additional experimental 
data are required to settle this problem. 

Theoretical values for G\, and G\, followed from Fig. [5l are listed in Table. To estimate 
the unfolding barriers from the available experimental data for 127 [llj] and DDFLN4 12, 22|. 
we used the following formula: 



AG^ = -kBT\n{TA/T^), 



(4) 



where r° denotes the unfolding time in the absence of force, and ta is a typical unfolding 
prefactor. Since ta is not known for unfolding, we used the value typical of folding, ta = 



0.1/is 



26 



27 



28 



For 127, we used r°2 



(3)^1 X lO^s and total unfolding time r° = (2.9)"^ x lO^s [11|. 



is extracted as r°i = t° — t^2- Using these unfolding times and Eq. Cj, and G| 
were calculated (Table). The best agreement between theory and experiment is obtained 
for G^. Interestingly, using the similar fitting procedure and raw experimental data 18|. 



in two-state approximation, Dudko et al. 
result. 



15| obtained G^ = 20k bT, which is close to our 



In the DDFLN4 case, we used = (0.28)~^s and r°2 = (0.33)-^s of [l2| to estimate 
GI and GI Our resuh for AGI agrees well with the experimental data, but the theoretical 
value for AG\ turned out higher than the experimental one. This disagreement may be 
due to the limitation of the Go model. An another possible reason is that the experimental 
estimations were obtained using the same prefactor ta = O.l/xs for all cases and this might 
be invalid. 

To conclude, we have proposed a new simulation approach to delineate the FEL of multi- 
state proteins. Our method is simple to use, and it does not require any extra CPU cost, 
because the unfolding times r^i, and r„2 are computed in a single run for every trajectory. 
Using this method and the simple Go model, we obtained Xui, and Xu2, which are in reason- 



able agreement with the experimental data for 127 



nj and DDFLN4 [jj, 12^. There is a 
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discrepancy between theoretical and experimental estimations for some unfolding barriers. 
Therefore, it would be useful to go beyond the Go model to see if one could obtain better 
agreement. Our method is universal and may be applied to other multi-state biomolecules. 
The work in this direction is in progress. One can also extend our approach to the case of 
folding under quenched force for computing folding barriers. 
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visits to Warsaw. MSL was supported by the Ministry of Science and Informatics in Poland 
(grant No 202-204-234). 
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x,i(A) Xu2{k) AGl/kBT AGl/kBT 



127 


Theory 


3.2 ± 0.1 


3.0 ± 0.1 


16.9 


17.0 




Exp. [11] 


2.2 


3.0 


20.9 


24.2 


DDFLN4 


Theory 


6.1 ± 0.2 


5.1 ± 0.2 


25.8 


18.7 




Exp. [12,22j 


4.0 ±0.4 


5.3 ± 0.4 


17.4 


17.2 



Table. Parameters Xui, and Xu2 were obtained in the Bell approximation. Theoretical 
values of the unfolding barriers were extracted from the microscopic theory of Dudko et al 
jisl (Eq. ([3])) with u = 1/2, while their experimental estimates were obtained using Eq. (j4]) 
and ta = O.l/xs. 
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Figure Captions 



FIGURE 1. Schematic plot of the free energy landscape for a three-state protein as a 
function of the end-to-end distance, x^i and refer to the distance between the NS 
and the first transition state (TSl) and the distance between the intermediate state (IS) 
and the second transition state (TS2). The unfolding barrier AG| = Gtsi ^ G^s and 
AGI = Gts2 - Gis- a midpoint between TSl and TS2 Ai?* can be determined from 
simulations. 

FIGURE 2. (a) The force-extension curve at the loading rate v = 3.6 x 10^ nm/s for 
127. The results are averaged over 100 trajectories. The arrows indicate the position of 
^Rmaxi ~ 8 A and ARmax2 = 78 A. The vertical solid line refers to AR* ^ 43 A. The 
inset shows the native state structure of 127 (PDB ID: ITIT), which contains seven (3- 
strands labeled as A to G. (b) The time dependence of the end-to-end extension for 10 
representative trajectories. Dashed lines refer to the first and the second plateau, which are 
located at ARpi ^ 6 A and ARp2 ~ 78 A, respectively. The solid straight line corresponds 
to AR* = 42 A. T = 285 K and / = 75 pN. 

FIGURE 3. (a) The same as in Fig. [2^ but for DDFLN4. The results are averaged over 
100 trajectories. The arrows indicate the position of ARmaxi = 14 A and ARmax2 = 92 
A. The dashed line refers to AR* ^ 53 A. The inset shows the native state structure of 
DDFLN4 taken from PDB (PDB ID: IKSR). There are seven /5-strands: A (6-9), B (22-28), 
C (43-48), D (57-59), E (64-69), F (75-83), and G (94-97). In the native state, there are 15, 
39, 23, 10, 27, 49, and 20 native contacts formed by strands A, B, C, D, E, F and G with the 
rest of the protein, respectively. The end-to-end distance in the native state R^s = 40.2 A. 
(b) The same as in Fig. [2)3 but for DDFLN4. Dashed lines refer to the first and the second 
plateau, which are located at ARpi ^ 7 Aand ARpi ^ 85 A, respectively. The solid straight 
hue corresponds to AR* = 46 A. T = 285 K and / = 75 pN. 

FIGURE 4. (a) The dependence of fractions of native contacts on the end-to-end extension 
for DDFLN4. The results were obtained from the constant loading rate pulling simulations 
with V = 3.6 X 10^ nm/s as in Fig. [3^. Arrows refer to positions of the peaks in the 
force-extension curve in Fig. [3^. (b) A typical snapshot at AR ^ 14 A. A single contact 
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between strands A and F is not broken (solid line), while all 11 native contacts between 
strands F and G are already broken (dashed lines). Note that for the cutoff = 6.5 A , 
there is only one contact between A and F in the native state, (c) Shown are probabilities 
of unfolding pathways Pufpw for seven /5-strands. The values of Pufpw are written on top of 
the histograms. Results were averaged over 100 trajectories. 

FIGURE 5. (a) The force dependencies of unfolding times for (circle), t^i (squares) 
and Tu2 (diamonds) for 127 at T = 285 K. The straight black lines refer to linear fits in the 
Bell approximation for r„i and r„2. The red curves correspond to the fitting by Eq. ([3]) with 
V = 1/2. The unfolding barriers, followed from this non-linear fit, are listed in Table, (b) 
The same as in (a) but for DDFLN4. The fit curves go up at high forces, where the Eq. ([3]) 
is no longer valid |l^. 
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